remarkable accuracy rate of 99.6% for the VGG-16 net model, while the VGG-19
net achieves a 100% accuracy rate. Based on these findings, it can be [...]
images of grapevine leaves compared to the VGG-16 net.


1. INTRODUCTION

The field of image classification has experienced significant growth and has become increasingly
popular among technology developers in recent times. This growth can be attributed to the rapid increase in
data volume across various industries such as electronic commerce, automotive, medical care, and
gaming [1], [2]. Classification is a methodical process of organizing entities into distinct groups and categories
based on their inherent features. Image classification emerged as an endeavor to bridge the disparity between 
computer vision and human vision, accomplished by training computers using relevant data. This task 
involves segregating images into predetermined categories based on their visual content [3], [4]. Machine 
learning represents a critical facet within the domain of artificial intelligence. Despite its trajectory of over 
five decades of development, certain challenges remain unresolved. These challenges include intricate image 
comprehension and recognition, natural language translation, as well as recommendation systems [5], [6]. 
Deep learning constitutes a significant offshoot founded on the principles of machine learning. The approach 
leverages the hierarchical nature of artificial neural networks and biological neural systems to analyze data, 
where it acquires high-level features through the integration of low-level features using feature combination 
methodologies. This capability enables the successful accomplishment of tasks such as image categorization
or prediction. Unlike conventional machine learning, deep learning leverages multi-layered neural networks to 
autonomously learn from images and extract intricate underlying features present within the images [7], [8]. 
Among the prevailing deep neural network architectures, convolutional neural networks (CNNs, or ConvNets)
stand as one of the most widely embraced. CNNs convolve learned features with the input data, and
their utilization of 2D convolutional layers renders them highly adept at processing two-dimensional data, 
notably images. This architecture obviates the requirement for manual feature extraction, sparing the need to 
explicitly identify features employed in image classification. Instead, CNNs operate by directly extracting 
features from the images themselves [9], [10]. This research employs two distinct deep learning 
methodologies, namely VGG-16 and VGG-19, for the classification of grapevine leaf images, subsequently 
predicting their correct class. A sizable dataset comprising thousands of images serves as the input data for 
this study. The investigation focuses on analyzing and comparing the accuracy achieved during the ‘train’ 
sessions at various percentage levels. 

The remainder of this article is organized as follows. Section 2 reviews related work. Section 3 presents the
materials and methods used in the proposed grapevine leaf image classification system. Section 4 describes
the layout of the proposed system. Section 5 reports the findings, and Section 6 presents the
conclusions.


2. RELATED WORK 

An extensive review of relevant literature used in this investigation is provided in this section. 
Diago et al. [11] provided a novel approach for analyzing leaf area and yield in color photos while 
characterizing the grapevine canopy. The approach is predicated on creating a supervised classifier using the 
Mahalanobis distance. It automates the processing of image sets and computes the areas (in terms of how 
many pixels there are). The initialization of each class relies on user input, wherein the user selects 
representative pixels to serve as clustering anchors. The segmentation outcomes demonstrate impressive 
performance, with 92% accuracy for leaf identification and 98% accuracy for cluster detection. The 
simplicity of the image acquisition setup and the precise definition of pixel classes make this method robust 
and well-suited for providing valuable information for vineyard management. 

Pereira et al. [12] proposed a novel method for evaluating the effectiveness of transfer learning and 
fine-tuning methods using the AlexNet, with a specific focus on grape variety identification. This study 
involves two distinct vineyard image datasets collected from different geographic areas. The process of 
generating diverse datasets for training and classification involved the utilization of various image processing 
techniques, among which a warping method utilizing four image corners was employed. Applying the
transfer learning scheme based on AlexNet to the pre-processed image dataset yielded an accuracy of
77.30%. Additionally, the same classifier model achieved an accuracy of 89.75% on the well-known
Flavia leaf dataset.

Koklu et al. [13] suggested a classification technique to classify grapevine leaves images using deep 
learning techniques. Initially, 500 vine leaf images from 5 diverse classes were captured using a specialized 
self-illuminating system. Data augmentation methods were then employed to expand this dataset, resulting in 
a total of 2500 images. For the classification task, a modern CNN model called MobileNetv2 was tuned. 
Three distinct approaches were explored. In the first approach, classification was performed directly using
the tuned MobileNetv2 model. In the second approach, features were extracted from the tuned model, and
classification was carried out using several support vector machine (SVM) classifiers. In the third approach,
1000 features were extracted from MobileNetv2 and selected using the Chi-Squares method. These features were then reduced
to 250 through dimensionality reduction. The classification was subsequently performed using various SVM 
kernels based on these chosen features. The Chi-Squares approach was found to be the most effective at 
extracting features from MobileNetv2’s logits layer and then reducing those features. Remarkably, this 
approach achieved a classification success rate of 97.60%. 

Zhang et al. [14] proposed a deep learning-based method named YOLOv5-CA, designed to attain an
optimal balance between grape downy mildew (GDM) detection accuracy and processing speed in normal 
circumstances. The approach incorporates a coordinate attention (CA) mechanism into the YOLOv5 
architecture, effectively highlighting visual features relevant to downy mildew disease, thereby enhancing the 
detection performance. To assess the efficacy of the proposed approach, the challenging GDM dataset was 
acquired from a vineyard under various natural scenes, encompassing diverse lighting, shadows, and 
backgrounds. According to the findings, YOLOv5-CA achieved a detection precision of 85.59%, a recall of
83.70%, and an mAP@0.5 of 89.55%. These performance metrics outperform those of well-known techniques
such as Faster R-CNN, YOLOv3, and YOLOv5. The suggested method also has excellent inference speed,
processing at 58.82 frames per second, making it appropriate for real-time disease control requirements. 

Ahmed et al. [15] proposed a CNN-based model tailored for grape leaf classification by adapting the
DenseNet201 architecture. A primary focus of this investigation is to assess the influence of layer freezing on 
the performance of DenseNet201 during the fine-tuning process. To conduct this research, a publicly available 
dataset comprising 500 images, encompassing five distinct classes with 100 images per class was utilized. 
To expand the training set, various data augmentation techniques were employed. The proposed CNN model, 
named DenseNet-30, demonstrated superior performance compared to existing works on grape leaf 
classification, where the dataset was originally sourced. The DenseNet-30 achieved an impressive overall 
accuracy of 98%, underscoring its effectiveness in accurately classifying grape leaves. 


3. MATERIALS AND METHODS

This section explains the materials and methods used in the proposed system. Section 3.1 describes
the dataset used in the proposed system. Sections 3.2 and 3.3 describe the deep learning networks
used in the grapevine leaf classification process, namely VGG-16 and VGG-19, respectively.


3.1. Grapevine leaves image dataset 

In this study, we used the freely available dataset acquired from [13]. This collection
includes grapevine leaf samples from five different species: “Ak,” “Ala Idris,” “Buzgulu,” “Dimnit,” and
“Nazli.” For every species, there are 100 images, each with dimensions of 512x512 pixels. Consequently, the
total number of images used in this study is 500. The images were acquired using a specialized
automatic illumination system. Figure 1 presents a representative
sample from each class.




Figure 1. A sample of each class of grapevine leaf 


3.2. VGG-16 net 

The VGG-16 architecture employed in this investigation is a deep CNN
structure, where the number “16” denotes the presence of 16 layers, encompassing both convolutional and
fully connected layers. This design is characterized by a deep stack of compact 3x3 convolutional filters
with a stride of 1. The convolutional layers use same padding, and the pooling layers adopt a 2x2
configuration with a stride of 2. By default, the VGG-16 network processes input images of size
224x224. Preceding the fully connected layers, a 7x7 feature map with 512 channels is produced.
Subsequently, this feature map is flattened into a vector with 25,088 elements (7x7x512) as the resulting
feature representation [16], [17]. Figure 2 shows the structure of the VGG-16 net [18].
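
The arithmetic behind the 7x7x512 feature map can be checked with a short sketch. It assumes, as in the standard VGG configuration, five 2x2 max-pooling stages with stride 2 and 3x3 convolutions that preserve the spatial size; the function name `vgg_feature_shape` is illustrative only.

```python
def vgg_feature_shape(input_size=224, num_pools=5, channels=512):
    """Return (height, width, channels) of the last feature map.

    Assumes every 3x3 convolution preserves the spatial size and every
    2x2 max-pooling layer with stride 2 halves it.
    """
    size = input_size
    for _ in range(num_pools):
        size //= 2  # 224 -> 112 -> 56 -> 28 -> 14 -> 7
    return size, size, channels


height, width, channels = vgg_feature_shape()
flattened = height * width * channels  # length of the flattened vector
print(height, width, channels, flattened)  # 7 7 512 25088
```

Halving 224 five times gives 7, so the flattened vector has 7 x 7 x 512 = 25,088 elements, matching the value quoted above.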


Figure 2. The architectural configuration of the VGG-16 net model [18]


3.3. VGG-19 net

VGG-19 is a feed-forward network composed of 19 layers arranged sequentially.
The network uses only 3x3 convolutional filters throughout the whole architecture, which makes it
computationally more efficient than architectures that use larger filters. The first 16 layers of VGG-19 are
convolutional layers, separated into five blocks, each of which comprises several convolutional
layers followed by a max-pooling layer. As the network progresses deeper, the number of filters in
each block increases. The convolutional layers are responsible for extracting salient features from
the input images, whereas the fully connected layers classify images using the extracted features.
Additionally, the use of max-pooling layers reduces the dimensionality of the features and lessens the
possibility of overfitting. A visual representation of the architectural arrangement of the VGG-19 net model
is shown in Figure 3 [19], [20].
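
The layer counts of the two networks can be tallied from their block layouts. The sketch below uses the standard VGG configurations (convolutions per block and filters per block), which this article does not list and which are stated here as an assumption; only layers with weights are counted, so pooling layers are excluded.

```python
# Convolutional layers per block in the standard VGG configurations
# (assumed here; the article does not list them explicitly).
VGG_BLOCKS = {
    "VGG-16": [2, 2, 3, 3, 3],
    "VGG-19": [2, 2, 4, 4, 4],
}
# Filters per block: the count never decreases as the network gets deeper.
FILTERS_PER_BLOCK = [64, 128, 256, 512, 512]
NUM_FULLY_CONNECTED = 3


def count_weight_layers(name):
    """Count convolutional plus fully connected layers (pooling excluded)."""
    return sum(VGG_BLOCKS[name]) + NUM_FULLY_CONNECTED


for name, blocks in VGG_BLOCKS.items():
    print(name, sum(blocks), "conv +", NUM_FULLY_CONNECTED,
          "FC =", count_weight_layers(name))
```

Under these assumptions VGG-19 has 16 convolutional layers and 3 fully connected layers, which agrees with the description above.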


Figure 3. The architectural configuration of the VGG-19 net model [19]


4. PROPOSED SYSTEM 

In this study, we propose a grapevine leaf image classification system based on the VGG-16 and
VGG-19 deep learning nets. It has six main consecutive stages: (i) loading the color grapevine leaf image
dataset, (ii) resizing the images, (iii) loading the deep learning nets, (iv) setting the training options for the
deep learning nets, (v) testing an image from inside the dataset, and (vi) testing an image from outside the
dataset. Figure 4 illustrates the structure of the proposed grapevine leaf image classification system.


Figure 4. The proposed grapevine leaves images classification layout


4.1. Color grapevine leaves images dataset loading

This is the first stage of the proposed system. It involves reading and
loading a collection of color images of grapevine leaves into memory, which enables further processing,
analysis, or machine learning tasks.


4.2. Resize the image 

This stage changes the dimensions of an image, making it larger or smaller. Resizing an
image alters the number of pixels it contains by increasing or decreasing
its width and height. In the proposed system, each image is resized to 224x224 pixels.
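
As a rough illustration of the resizing stage, the sketch below resizes an image stored as a nested list by sampling the nearest source pixel. The interpolation method is an assumption, since the article does not state which one was used, and the helper name `resize_nearest` is hypothetical.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a 2D image (list of rows) by nearest-neighbour sampling."""
    old_height, old_width = len(image), len(image[0])
    resized = []
    for row in range(new_height):
        src_row = row * old_height // new_height  # source row index
        resized.append([
            image[src_row][col * old_width // new_width]
            for col in range(new_width)
        ])
    return resized


# A 512x512 dummy image, matching the dataset's image size, reduced to
# the 224x224 size used in the proposed system.
dummy = [[(r + c) % 256 for c in range(512)] for r in range(512)]
small = resize_nearest(dummy, 224, 224)
print(len(small), len(small[0]))  # 224 224
```

A color image would apply the same index mapping to each of its three channels.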


4.3. Loading the deep learning nets 

This stage loads pre-trained neural network models into memory so that they can make
predictions or be trained further on new data. Deep learning networks are typically trained on large datasets
for specific tasks such as image classification. Loading is performed in MATLAB, and two
pre-trained networks are loaded: VGG-16 and VGG-19.


4.4. Training options for the deep learning nets 

During training, the model’s parameters are optimized to reduce a selected loss
function, which measures the difference between the expected and actual outputs. After training, the model’s
architecture and learned parameters (weights and biases) are saved to disk in a serialized format, which
allows the model to be reused later without retraining.
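
To make the role of the loss function concrete, the sketch below computes a softmax cross-entropy loss for a single image. Cross-entropy is a common choice for multi-class classification, but the article does not name the loss it used, so this choice and the example scores are assumptions.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, true_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(scores)[true_index])


# Hypothetical scores for the five classes; index 0 is the true class.
confident = cross_entropy([5.0, 0.0, 0.0, 0.0, 0.0], true_index=0)
uncertain = cross_entropy([1.0, 1.0, 1.0, 1.0, 1.0], true_index=0)
print(confident < uncertain)  # True: a better prediction gives a lower loss
```

Training adjusts the weights and biases so that this value, averaged over the training images, decreases.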


4.5. Tested image from inside dataset 

A tested image is an individual image or data point used to assess the
effectiveness of a trained deep learning model. During testing, the model takes the tested
image as input and generates an output (e.g., a predicted class label for image classification tasks). The
model’s predictions are then compared to the ground-truth labels of the test set to compute metrics such as
accuracy, precision, and recall. These metrics make it easier to assess how well the model completes
the assigned task because they describe its performance on data that it has not seen before.
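
The comparison between predictions and ground-truth labels can be sketched as a table of label pairs from which per-class counts are derived, treating one class as positive and the rest as negative. The labels below are made-up examples, not results from this study, and the helper names are illustrative.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def one_vs_rest(counts, target):
    """Return (TP, TN, FP, FN) treating `target` as the positive class."""
    tp = counts[(target, target)]
    fn = sum(n for (t, p), n in counts.items() if t == target and p != target)
    fp = sum(n for (t, p), n in counts.items() if t != target and p == target)
    tn = sum(counts.values()) - tp - fn - fp
    return tp, tn, fp, fn


# Made-up labels for six test images (not data from the article).
truth = ["Ak", "Ak", "Buzgulu", "Dimnit", "Nazli", "Nazli"]
preds = ["Ak", "Buzgulu", "Buzgulu", "Dimnit", "Nazli", "Nazli"]
counts = confusion_counts(truth, preds)
print(one_vs_rest(counts, "Ak"))       # (1, 4, 0, 1)
print(one_vs_rest(counts, "Buzgulu"))  # (1, 4, 1, 0)
```

In this toy example one “Ak” image is misread as “Buzgulu”, which appears as a false negative for “Ak” and a false positive for “Buzgulu”.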


4.6. Tested image from outside dataset 

A tested image from outside the dataset is an image that is not part of the original dataset used for
training and evaluating the deep learning model. Instead, it comes from a completely different dataset that
the model never encountered during training or testing. This simulates how the model would perform on
entirely new, unseen data from a different source or distribution.


5. RESULTS AND DISCUSSION 

Four primary criteria are used to assess the efficiency and accuracy of the proposed system,
namely accuracy, precision, recall, and specificity. Accuracy is the proportion of correct predictions
made by the system, as given in (1) [21]. Precision measures the ratio of true positive detections to all positive
detections, as given in (2) [22]. Recall measures the proportion of ground-truth positives that are correctly
identified, as given in (3) [23]. Lastly, specificity quantifies the proportion of correctly identified negatives,
as given in (4) [24], [25]. In these equations, TP, TN, FP, and FN denote the numbers of true positives, true
negatives, false positives, and false negatives, respectively.


Accuracy = (TP + TN)/(TP + TN + FP + FN) (1)
Precision = TP/(TP + FP) (2)
Recall = TP/(TP + FN) (3)
Specificity = TN/(TN + FP) (4)
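
Equations (1) to (4) translate directly into code. The sketch below evaluates them for a hypothetical set of counts chosen only for illustration; they are not the counts behind the results reported in this article, and the functions do not guard against empty denominators.

```python
def accuracy(tp, tn, fp, fn):
    """Equation (1): share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Equation (2): share of positive predictions that are correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Equation (3): share of actual positives that are found."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Equation (4): share of actual negatives that are found."""
    return tn / (tn + fp)


# Hypothetical counts for one class treated as the positive class.
tp, tn, fp, fn = 19, 79, 1, 1
print(f"accuracy    = {accuracy(tp, tn, fp, fn):.2%}")  # 98.00%
print(f"precision   = {precision(tp, fp):.2%}")         # 95.00%
print(f"recall      = {recall(tp, fn):.2%}")            # 95.00%
print(f"specificity = {specificity(tn, fp):.2%}")       # 98.75%
```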


The effectiveness of the proposed grapevine leaf image classification system, which uses the VGG-16 and
VGG-19 deep learning nets, is evaluated using the dataset described in Section 3.1. The dataset was divided
into two parts: training data made up 80% of the whole dataset, while testing data made up the remaining
20%. Evaluating the proposed system’s performance entails training two distinct
networks, namely the VGG-16 net and the VGG-19 net. The classification process begins by passing
the dataset images through all stages of the proposed model, which facilitates training and accuracy
assessment. Table 1 presents the outcomes obtained from the assessment of the proposed model using the
aforementioned metrics. The performance results for the VGG-16 net are as follows: accuracy 99.6%,
precision 99.00%, recall 99.00%, and specificity 99.75%. The VGG-19 net, in contrast, achieved a perfect
score of 100.00% across all metrics. According to the test findings, the VGG-19 net
outperforms the VGG-16 net and is therefore deemed superior in its ability to accurately
classify grapevine leaf images. Figure 5 and Figure 6 present the precision, recall, and specificity results for
each class within the dataset for the VGG-16 net and the VGG-19 net, respectively.
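
The 80/20 partition can be sketched as follows. Splitting each class separately (a stratified split) and the fixed random seed are assumptions made for this illustration, as the article does not describe how images were assigned to the two subsets; the class and file names are placeholders.

```python
import random


def stratified_split(files_by_class, train_fraction=0.8, seed=0):
    """Shuffle each class independently and split it into train/test lists."""
    rng = random.Random(seed)
    train, test = [], []
    for label, files in files_by_class.items():
        shuffled = list(files)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(name, label) for name in shuffled[:cut]]
        test += [(name, label) for name in shuffled[cut:]]
    return train, test


# Placeholder file names: 5 classes with 100 images each, as in the dataset.
classes = ["class_1", "class_2", "class_3", "class_4", "class_5"]
dataset = {c: [f"{c}_{i:03d}.png" for i in range(100)] for c in classes}
train_set, test_set = stratified_split(dataset)
print(len(train_set), len(test_set))  # 400 100
```

With 100 images per class, this yields 80 training and 20 testing images for every class.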


Table 1. The results of grapevine leaf image classification

Model        Accuracy (%)   Precision (%)   Recall (%)   Specificity (%)
VGG-16 net   99.6           99.00           99.00        99.75
VGG-19 net   100            100             100          100


Figure 5. The results obtained for each class when utilizing the VGG-16 net

Figure 6. The results obtained for each class when utilizing the VGG-19 net


Figure 7 and Figure 8 show four test images randomly selected from inside and
outside the dataset to assess the efficacy of the proposed model. In each figure, the first image, which was
selected from inside the dataset, is classified with an accuracy of 100%, while the
accuracy decreases for the remaining three images from outside the dataset, depending on the strength of the
network in the classification process. These figures also show the efficiency of the VGG-19 net in
classifying images from outside the dataset: the accuracies for the three outside test images are
95.373%, 56.209%, and 50.022% with the VGG-16 net, compared with 99.936%, 98.352%, and 71.683% with
the VGG-19 net. Figure 9
illustrates the comparative analysis between the VGG-16 net and the VGG-19 net concerning their accuracy,
precision, recall, and specificity. Notably, the results highlight the superior performance of the VGG-19 net
in the image classification task.


[Test images, VGG-16 net: Test-1: Ak = 100 (from inside the dataset); Test-2: Ak = 95.373, Test-3: Ala Idris = 56.209, Test-4: Ak = 50.022 (from outside the dataset)]

[Test images, VGG-19 net: Test-1: Ak = 100 (from inside the dataset); Test-2: Ak = 99.936, Test-3: Buzgulu = 98.352, Test-4: Ak = 71.683 (from outside the dataset)]


Figure 9. Comparison between VGG-16 Net and VGG-19 Net 


Table 2 compares the proposed approach with previous studies. Both the VGG-16 net and the VGG-19 net
achieve higher values than the earlier investigations on every metric that those studies report, which
supports the efficacy of the proposed system.


Table 2. Comparison of our outcomes with results from earlier experiments 


Ref.         Accuracy (%)   Precision (%)   Recall (%)   Specificity (%)
[12]         89.75          -               -            -
[13]         97.60          -               -            -
[14]         -              85.59           83.70        -
[15]         98.00          -               -            -
VGG-16 net   99.6           99.00           99.00        99.75
VGG-19 net   100            100             100          100


6. CONCLUSION

In conclusion, the use of the deep learning models VGG-16 and VGG-19 for the classification of
grapevine leaf images has proven to be very successful and promising. Both
models demonstrated that they can handle challenging image identification tasks through their strong
performance in correctly classifying grapevine leaves into different groups. Regarding overall accuracy and
generalization, the deeper architecture of the VGG-19 model gave it a slight advantage over the VGG-16
model. However, when compared to traditional machine learning methods, both models showed notable gains,
underscoring the effectiveness of deep learning for image classification problems. Based on the data, the
VGG-16 net obtains an accuracy rate of 99.6%, whereas the VGG-19 net achieves a 100% accuracy rate.
Based on this comparison, the VGG-19 net performs better than the VGG-16 net in the classification of
grapevine leaf images.